Analysis of Different Techniques for Segmentation of Connected Handwritten Indian Scripts Words

نویسندگان

  • Dharam Veer Sharma
  • Supreet Kaur
چکیده

In OCR processing text is extracted from an image. Here many common imperfections occur, like the characters that are connected together are returned as a single sub-image containing both characters. So this is the major challenge in the recognition process and we need to segment the connected words to get correct results. In segmentation process the first task is to find out the segmentation point. Once this is done the next step is to fragment the image from that point and get two separate sub-images for each character. There are many algorithms proposed for this process. One class of approaches uses the straight segmentation technique and segments the words based on the vertical and horizontal histogram profiles. Another approach uses contour features of the component for segmentation. Some researchers segments the characters by recognizing the profile features of touching character and the segmenting them. This technique is commonly known as recognition based segmentation or cut classification.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting is to make searchable unindexed image documents by locating word/words in a doc-ument image, given a query word. This problem is challenging, mainly due to the large numberof word classes with very small inter-class and substantial intra-class distances. In this paper, asegmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

متن کامل

Extraction of Characters and Modifiers from Handwritten Gujarati Words

The research activity related to Optical Character Recognition (OCR) for almost all Indian languages is very less. Gujarati script is one of the scripts for which very less literature is available, as far as OCR activities are concerned. This paper describes one of the important phase of OCR, segmentation of handwritten words into its basic components namely basic characters, conjunct character...

متن کامل

Component-based Segmentation of Words from Handwritten Arabic Text

Efficient preprocessing is very essential for automatic recognition of handwritten documents. In this paper, techniques on segmenting words in handwritten Arabic text are presented. Firstly, connected components (ccs) are extracted, and distances among different components are analyzed. The statistical distribution of this distance is then obtained to determine an optimal threshold for words se...

متن کامل

Handwritten Segmentation in Bangla Script: A Review of Offline Techniques

Offline handwritten segmentation in Bangla is an interesting area of research as Segmentation has long been one of the most critical areas of optical character recognition process. Through this operation, an image of a sequence of characters, which may be connected in some cases, is decomposed into sub-images of individual alphabetic symbols. In this paper, segmentation of cursive handwritten s...

متن کامل

Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns

The purpose of this study is to analyze the performance of Back propagation algorithm with changing training patterns and the second momentum term in feed forward neural networks. This analysis is conducted on 250 different words of three small letters from the English alphabet. These words are presented to two vertical segmentation programs which are designed in MATLAB and based on portions (1...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011